Abstract

The purpose of this note is to show that the solution to the Kantorovich
optimal transportation problem is supported on a Lipschitz manifold,
provided the cost is $C^2$ with non-singular mixed second derivative. We
use this result to provide a simple proof that solutions to Monge's
optimal transportation problem satisfy a change of variables equation
almost everywhere.



1 Introduction 

Given Borel probability measures $\mu^+$ and $\mu^-$ on smooth
$n$-dimensional manifolds $M^+$ and $M^-$ respectively and a cost function
$c : M^+ \times M^- \to \mathbb{R}$, the Kantorovich problem is to pair the
two measures as efficiently as possible relative to $c$. A precise
formulation requires some notation. For a measure $\gamma$ on
$M^+ \times M^-$, we define the marginals of $\gamma$ to be its push
forwards under the canonical projections $\pi^+$ and $\pi^-$; put another
way, the marginals are measures on $M^+$ and $M^-$ respectively given by
the formulae $\pi^+_\#\gamma(A) = \gamma(A \times M^-)$ and
$\pi^-_\#\gamma(B) = \gamma(M^+ \times B)$ for all Borel sets
$A \subset M^+$, $B \subset M^-$. The Kantorovich problem is then to
minimize the functional

$$\int_{M^+ \times M^-} c(x,y)\, d\gamma(x,y) \qquad (1)$$

among all measures $\gamma$ on $M^+ \times M^-$ whose marginals are
$\pi^+_\#\gamma = \mu^+$ and $\pi^-_\#\gamma = \mu^-$.
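
For intuition, here is a minimal sketch of a discrete instance of the minimization of (1). It assumes that both marginals are uniform on the same number of atoms; in that special case a minimizer can be found among permutations (Birkhoff-von Neumann theorem), so brute force is enough for tiny examples. The names `kantorovich_bruteforce`, `xs` and `ys` are illustrative and do not come from the paper.

```python
from itertools import permutations


def kantorovich_bruteforce(xs, ys, cost):
    """Minimize the average of cost(xs[i], ys[sigma[i]]) over permutations.

    Assumption of this sketch (not of the paper): both marginals are
    uniform on len(xs) == len(ys) atoms.
    """
    n = len(xs)
    best_value, best_sigma = None, None
    for sigma in permutations(range(n)):
        value = sum(cost(xs[i], ys[sigma[i]]) for i in range(n)) / n
        if best_value is None or value < best_value:
            best_value, best_sigma = value, sigma
    return best_value, best_sigma


xs = [0.0, 1.0, 2.0]
ys = [0.5, 1.5, 2.5]
# Quadratic cost on the line; the monotone pairing is expected to win.
value, sigma = kantorovich_bruteforce(xs, ys, lambda x, y: (x - y) ** 2)
```

The search is factorial in the number of atoms, so this is only meant to make the objects in (1) concrete, not to be used as a solver.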

Under fairly weak conditions, it is straightforward to show that a
solution to this problem exists. In this paper, we study what can be said
about that solution under a certain non-degeneracy condition on the cost
function, which was originally introduced in an economic context by McAfee
and McMillan [25] and later rediscovered by Ma, Trudinger and Wang [26];
in the terminology of Ma, Trudinger and Wang, it is also known as the (A2)
condition.

In what follows, $D^2_{xy}c(x_0,y_0)$ will denote the $n$ by $n$ matrix of
mixed second order partial derivatives of the function $c$ at the point
$(x_0,y_0) \in M^+ \times M^-$; its $(i,j)$th entry is
$\frac{\partial^2 c}{\partial x_i \partial y_j}(x_0,y_0)$.

Definition 1.1. Assume $c \in C^2(M^+ \times M^-)$. We say that $c$ is
non-degenerate at a point $(x_0,y_0) \in M^+ \times M^-$ if
$D^2_{xy}c(x_0,y_0)$ is nonsingular; that is, if
$\det(D^2_{xy}c(x_0,y_0)) \neq 0$.
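
As a rough numerical companion to Definition 1.1, the sketch below approximates the matrix of mixed second derivatives by central finite differences and evaluates its determinant for the quadratic cost $c(x,y) = |x-y|^2/2$ on $\mathbb{R}^2$, whose mixed Hessian is $-I$. The helper names (`mixed_hessian`, `det2`, `quadratic_cost`) and the step size `h` are assumptions made for illustration only.

```python
def mixed_hessian(c, x, y, h=1e-4):
    """Central finite-difference approximation of d^2 c / dx_i dy_j."""
    n = len(x)

    def shift(p, k, d):
        q = list(p)
        q[k] += d
        return q

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (
                c(shift(x, i, h), shift(y, j, h))
                - c(shift(x, i, h), shift(y, j, -h))
                - c(shift(x, i, -h), shift(y, j, h))
                + c(shift(x, i, -h), shift(y, j, -h))
            ) / (4 * h * h)
    return H


def det2(H):
    """Determinant of a 2x2 matrix (this sketch only treats n = 2)."""
    return H[0][0] * H[1][1] - H[0][1] * H[1][0]


def quadratic_cost(x, y):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))


H = mixed_hessian(quadratic_cost, [0.3, -1.0], [2.0, 0.7])
determinant = det2(H)  # close to det(-I) = 1, so the cost is non-degenerate here
```

A nonzero determinant at a sample point is only numerical evidence of non-degeneracy at that point; it does not replace the symbolic computation.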

For a probability measure $\gamma$ on $M^+ \times M^-$ we will denote by
$\mathrm{spt}(\gamma)$ the support of $\gamma$; that is, the smallest
closed set $S \subset M^+ \times M^-$ such that $\gamma(S) = 1$.

Our main result is: 

Theorem 1.2. Suppose $c \in C^2(M^+ \times M^-)$ and $\mu^+$ and $\mu^-$
are compactly supported; let $\gamma$ be a solution of the Kantorovich
problem. Suppose $(x_0,y_0) \in \mathrm{spt}(\gamma)$ and $c$ is
non-degenerate at $(x_0,y_0)$. Then there is a neighbourhood $N$ of
$(x_0,y_0)$ such that $N \cap \mathrm{spt}(\gamma)$ is contained in an
$n$-dimensional Lipschitz submanifold. In particular, if $D^2_{xy}c$ is
nonsingular everywhere, $\mathrm{spt}(\gamma)$ is contained in an
$n$-dimensional Lipschitz submanifold.

The proof of this theorem is based on an idea of Minty [30], which was
also used by Alberti and Ambrosio to show that the graph of any monotone
function $T : \mathbb{R}^n \to \mathbb{R}^n$ is contained in a Lipschitz
graph over the diagonal
$\Delta = \{u = \frac{x+y}{\sqrt{2}} : (x,y) \in \mathbb{R}^n \times \mathbb{R}^n\}$ [4].
The non-degeneracy condition can be viewed as a linearized version of
the twist condition, which asserts that the mapping
$y \in M^- \mapsto D_xc(x,y)$ is injective. Under suitable regularity
conditions on the marginals, Levin [21] showed that the twist condition
ensures that the solution to the Kantorovich problem is concentrated on
the graph of a function and is therefore unique; see also Gangbo [17]. For
the past two decades, the regularity of these maps has been an active area
of investigation. Regularity results were proven for the quadratic cost
function by Caffarelli [9] [10] [11], Delanoë [14] [15] and Urbas [36] and
for another special cost function by Wang [37]. These were then
generalized by Ma, Trudinger and Wang [26], who discovered a fourth order
differential condition on the cost function that ensures the optimal map
is smooth, provided the marginals are sufficiently regular [25] [M]. Our
results assert that something can be said about the smoothness of the
support even without these strong conditions on the cost and the
marginals, provided that one is willing to view the support as a
submanifold rather than a graph.

In one dimension, non-degeneracy implies twistedness, as was noted by
many authors, including Spence [M] and Mirrlees [31], in the economics
literature; see also [22]. In higher dimensions, this is no longer true;
the non-degeneracy condition will imply that the map
$y \in M^- \mapsto D_xc(x,y)$ is locally injective but not necessarily
globally. Non-degeneracy was a hypothesis in the smoothness proof in [26],
but does not seem to have received much attention in higher dimensions
before then. While our result demonstrates that the non-degeneracy
condition is enough to ensure that solutions still have certain regularity
properties, we will show by example that the uniqueness result that
follows from twistedness can fail for non-degenerate costs which are not
twisted. The twist condition is asymmetric in $x$ and $y$; that is, there
are cost functions for which the map $y \in M^- \mapsto D_xc(x,y)$ is
injective but $x \in M^+ \mapsto D_yc(x,y)$ is not. However, since
$(D^2_{xy}c)^T = D^2_{yx}c$, the non-degeneracy condition is certainly
symmetric in $x$ and $y$. In view of this, it is not surprising that the
twist condition can only be used to show solutions are concentrated on the
graphs of functions of $y$ over $x$ whereas the non-degeneracy condition
implies solutions are concentrated on $n$-dimensional submanifolds, a
result that does not favour either variable over the other.

Smooth optimal maps solve certain Monge-Ampère type equations. Typically,
an optimal map will be differentiable almost everywhere, but may not be
smooth. It has proven useful to know when non-smooth optimal maps solve
the corresponding equations almost everywhere. Formally, the link between
optimal transportation and these equations was observed by Brenier [H],
then Gangbo and McCann [18], and they were studied in detail by Ma,
Trudinger and Wang. An important step in showing that an optimal map
solves a Monge-Ampère type equation is first showing that it solves the
Jacobian (or change of variables) equation. An injective Lipschitz
function satisfies the change of variables formula almost everywhere, so
some sort of Lipschitz rectifiability for the graphs of optimal maps is a
useful tool in resolving this question. As an application of Theorem 1.2,
we provide a simple proof that optimal maps satisfy the change of
variables formula almost everywhere.

This work is related to another interesting line of research. A measure
$\gamma$ on the product $M^+ \times M^-$ is called simplicial if it is
extremal among the convex set of all measures which share its marginals.
There are a number of results describing simplicial measures and their
supports [16] [22] [7] [20] [3]. One consequence is that the supports of
simplicial measures are in some sense small; in particular, the support of
a simplicial measure on $[0,1] \times [0,1]$ must have two-dimensional
Lebesgue measure zero [22] [20]. However, any measure supported on the
graph of a function is simplicial and it is known that there exist
functions whose graphs have Hausdorff dimension $2 - \epsilon$, for any
$\epsilon > 0$ [1]. For any cost, the Kantorovich functional is linear and
is hence minimized by some simplicial measure. Conversely, any simplicial
measure is the solution to a Kantorovich problem for some continuous cost
function, and so by the remarks above there are continuous cost functions
whose optimizers are supported on sets of Hausdorff dimension
$2 - \epsilon$. On the other hand, an immediate consequence of our result
is that the support of any optimizer of a Kantorovich problem with a
non-degenerate $C^2$ cost has Hausdorff dimension at most $n$.

The result of Ma, Trudinger and Wang proving smoothness of the optimal
map under certain conditions immediately implies that the support of the
optimizer has Hausdorff dimension $n$; however, the proof of this result
requires that the marginals be smooth. Under the same assumptions on the
cost functions but weaker regularity conditions on the marginals, Loeper
and Liu [23] have demonstrated that the optimal map is Hölder continuous
for some Hölder exponent $0 < \alpha < 1$. It is worth noting that there
are examples of functions on $\mathbb{R}^n$ [1] which are Hölder
continuous with exponent $\alpha$ but whose graphs have Hausdorff
dimension $n + 1 - \alpha$, so the latter results do not imply that the
Hausdorff dimension of the support of the optimizer must be $n$.

In the second section of this manuscript we prove Theorem 1.2 while
Section 3 is devoted to discussion and examples. In the final section, we
use Theorem 1.2 to provide a simple proof that optimal maps satisfy a
prescribed Jacobian equation almost everywhere.

We are pleased to acknowledge that our interest in this topic was
stimulated in part by a fruitful discussion between one of the authors and
Ivar Ekeland.

2 Lipschitz Rectifiability of Optimal Transportation Plans

We now prove Theorem 1.2. Note that $\gamma$ minimizes the Kantorovich
functional if and only if it maximizes the corresponding functional for
$b(x,y) = -c(x,y)$. To simplify the computation, we consider $\gamma$ that
maximizes $b$.

Our proof relies on the $b$-monotonicity of the supports of optimal
measures:

Definition 2.1. A subset $S$ of $M^+ \times M^-$ is $b$-monotone if all
$(x_0,y_0), (x_1,y_1) \in S$ satisfy
$b(x_0,y_0) + b(x_1,y_1) \geq b(x_0,y_1) + b(x_1,y_0)$.

It is well known that the support of any optimizer is $b$-monotone [32],
provided that the cost is continuous and the marginals are compactly
supported. The reason for this is intuitively clear; if
$b(x_0,y_0) + b(x_1,y_1) < b(x_0,y_1) + b(x_1,y_0)$ then we could move
some mass from $(x_0,y_0)$ and $(x_1,y_1)$ to $(x_0,y_1)$ and $(x_1,y_0)$
without changing the marginals of $\gamma$ and thus increase the integral
of $b$.
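
The $b$-monotonicity inequality can be tested directly on a finite set of pairs. The following is a small sketch of such a check; the function name `is_b_monotone` and the sample sets are hypothetical, and $b(x,y) = xy$ on the real line is chosen only because monotone sets are then easy to recognise.

```python
from itertools import combinations


def is_b_monotone(pairs, b, tol=0.0):
    """Return True if every two points (x0, y0), (x1, y1) of `pairs` satisfy
    b(x0, y0) + b(x1, y1) >= b(x0, y1) + b(x1, y0) (up to `tol`)."""
    for (x0, y0), (x1, y1) in combinations(pairs, 2):
        if b(x0, y0) + b(x1, y1) < b(x0, y1) + b(x1, y0) - tol:
            return False
    return True


def b(x, y):
    return x * y


increasing = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.5)]  # y nondecreasing in x
decreasing = [(0.0, 1.0), (1.0, 0.0)]              # swapping partners raises b

increasing_ok = is_b_monotone(increasing, b)
decreasing_ok = is_b_monotone(decreasing, b)
```

For the second set, exchanging the partners gains $b(0,0) + b(1,1) - b(0,1) - b(1,0) = 1$, which is exactly the mass-moving argument described above.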

The strategy of our proof is to change coordinates so that locally
$b(x,y) = x \cdot y$, modulo a small perturbation. We then switch to
diagonal coordinates $u = x + y$, $v = x - y$ and show that the
monotonicity condition becomes a Lipschitz condition for $v$ as a function
of $u$. This trick dates back to Minty, who used it to study monotone
operators on Hilbert spaces [30]; more recently, Alberti and Ambrosio used
it to investigate the fine properties of monotone functions on
$\mathbb{R}^n$ [4].

We are now ready to prove Theorem 1.2: 

Proof. Choose $(x_0,y_0)$ in the support of $\gamma$. Changing coordinates
in a neighbourhood of $y_0$ yields $D^2_{xy}b(x_0,y_0) = I$ without loss
of generality. We then have $b(x,y) = x \cdot y + G(x,y)$, where
$D^2_{xy}G \to 0$ as $(x,y) \to (x_0,y_0)$. Set $u\sqrt{2} = x + y$ and
$v\sqrt{2} = y - x$. Given $\epsilon > 0$, choose a convex neighbourhood
$N$ of $(x_0,y_0)$ such that $\|D^2_{xy}G\| \leq \epsilon$ on $N$. We will
show that $\mathrm{spt}(\gamma) \cap N$ is contained in a Lipschitz graph
of $v$ over $u$; hence, $u$ and $v$ serve as local coordinates for our
submanifold. Take $(x,y)$ and $(x',y') \in N \cap \mathrm{spt}(\gamma)$.
Then, by $b$-monotonicity, we have
$b(x,y) + b(x',y') \geq b(x,y') + b(x',y)$, hence

$$x \cdot y + G(x,y) + x' \cdot y' + G(x',y') \geq x \cdot y' + G(x,y') + x' \cdot y + G(x',y).$$

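Setting $\Delta x = x' - x$, $\Delta y = y' - y$, $\Delta u = u' - u$,
$\Delta v = v' - v$, and rewriting yields

$$(\Delta x) \cdot (\Delta y) + (\Delta x) \cdot \int_0^1 \int_0^1 D^2_{xy}G[x + s\Delta x, y + t\Delta y](\Delta y)\, ds\, dt \geq 0 \qquad (2)$$

which, since the norm of $D^2_{xy}G$ is at most $\epsilon$ on the convex
neighbourhood, simplifies to:
$\Delta x \cdot \Delta y \geq -\epsilon|\Delta x||\Delta y|$.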

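Observe that $\Delta y\sqrt{2} = \Delta u + \Delta v$ and
$\Delta x\sqrt{2} = \Delta u - \Delta v$. Now,

$$|\Delta u|^2 - |\Delta v|^2 = 2(\Delta x) \cdot (\Delta y)$$
$$\geq -2\epsilon|\Delta x||\Delta y|$$
$$= -\epsilon|\Delta u - \Delta v||\Delta u + \Delta v|$$
$$\geq -\epsilon[|\Delta u|^2 + |\Delta v|^2].$$

The last inequality follows by squaring the absolute values of each side
and expanding the first term. Rearranging yields
$(1 + \epsilon)|\Delta u|^2 \geq (1 - \epsilon)|\Delta v|^2$, the desired
result.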

Note that $v$ may not be everywhere defined; that is, for certain values
of $u$ there may be no corresponding $v$ in $\mathrm{spt}(\gamma)$.
However, the function $v(u)$ can be extended by Kirszbraun's theorem and
hence we can conclude that $\mathrm{spt}(\gamma)$ is locally contained in
the graph of a Lipschitz function $v$ over $u$.

□ 
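
The rotation used in the proof can be illustrated numerically in the simplest case, a sketch under the assumptions $n = 1$ and $b(x,y) = xy$ (so the perturbation $G$ vanishes and $\epsilon = 0$). A monotone pairing of sorted samples is $b$-monotone, and in the coordinates $u = (x+y)/\sqrt{2}$, $v = (y-x)/\sqrt{2}$ the inequality $|\Delta v| \leq |\Delta u|$ should hold for every pair of points. All names below are illustrative.

```python
import math
import random


def rotate(x, y):
    """Diagonal coordinates u = (x + y)/sqrt(2), v = (y - x)/sqrt(2)."""
    return (x + y) / math.sqrt(2), (y - x) / math.sqrt(2)


def max_lipschitz_ratio(points):
    """Largest |dv| / |du| over all pairs of points, in rotated coordinates."""
    rotated = [rotate(x, y) for x, y in points]
    worst = 0.0
    for i in range(len(rotated)):
        for j in range(i + 1, len(rotated)):
            du = rotated[j][0] - rotated[i][0]
            dv = rotated[j][1] - rotated[i][1]
            if du != 0.0:
                worst = max(worst, abs(dv) / abs(du))
    return worst


random.seed(0)
xs = sorted(random.uniform(-1.0, 1.0) for _ in range(50))
ys = sorted(random.uniform(-1.0, 1.0) for _ in range(50))
support = list(zip(xs, ys))  # monotone pairing: b-monotone for b(x, y) = x*y

ratio = max_lipschitz_ratio(support)  # expected to be at most 1
```

With $\epsilon > 0$ the same computation would be compared against the constant $\sqrt{(1+\epsilon)/(1-\epsilon)}$ obtained from the final inequality of the proof.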

Remark 2.1. Note that the only property of optimal transportation plans
used in the proof is $b$-monotonicity, so we have actually proven that any
$b$-monotone subset of $M^+ \times M^-$ is contained in an $n$-dimensional
Lipschitz submanifold, provided $b$ is non-degenerate.



3 Discussion and Examples 

For twisted costs, one can show that $\gamma$ is concentrated on the graph
of a function, provided the marginal $\mu^+$ does not charge sets whose
dimension is less than or equal to $n - 1$ [17] [21] [3];^1 however, this
can fail if $\mu^+$ charges small sets. On the other hand, notice that our
proof did not require any regularity hypotheses on the marginals.

In the example below, we exhibit a non-degenerate cost which is not
twisted. We use this example to illustrate how, in this setting, solutions
may be supported on submanifolds which are not necessarily graphs. In
addition, we show that these solutions may not be unique. We can view this
example as expressing an optimal transportation problem on a right
circular cylinder via its universal cover, which is $\mathbb{R}^2$. The
non-twistedness of the cost and non-uniqueness of the solution arise
because different points in the universal cover correspond to the same
point in the cylinder and are therefore indistinguishable by our cost
function. In fact, if we expressed the problem on the cylinder, we would
have a twisted cost function and therefore a unique solution.

Example 3.1. Let $M^\pm = \mathbb{R}^2$ and
$c(x,y) = e^{x_1+y_1}\cos(x_2 - y_2) + \frac{e^{2x_1}}{2} + \frac{e^{2y_1}}{2}$.
Then
$D_xc(x,y) = (e^{x_1+y_1}\cos(x_2 - y_2) + e^{2x_1}, -e^{x_1+y_1}\sin(x_2 - y_2))$,
so $y \in M^- \mapsto D_xc(x,y)$ is not injective and $c$ is not twisted.
However, note that

$$D^2_{xy}c(x,y) = \begin{pmatrix} e^{x_1+y_1}\cos(x_2 - y_2) & e^{x_1+y_1}\sin(x_2 - y_2) \\ -e^{x_1+y_1}\sin(x_2 - y_2) & e^{x_1+y_1}\cos(x_2 - y_2) \end{pmatrix}.$$

Therefore, $\det D^2_{xy}c(x,y) = e^{2(x_1+y_1)} > 0$ for all $(x,y)$, so
$c$ is non-degenerate. Optimal measures for $c$, then, must be supported
on 2-dimensional Lipschitz submanifolds, but we will now exhibit an
optimal measure whose support is not contained in the graph of a function.

Now let $M$ be the union of the three graphs:

$$G_1 : y_1 = x_1,\ y_2 = x_2 + \pi \qquad (3)$$
$$G_2 : y_1 = x_1,\ y_2 = x_2 + 3\pi \qquad (4)$$
$$G_3 : y_1 = x_1,\ y_2 = x_2 + 5\pi \qquad (5)$$

Clearly, $M$ is a smooth 2-d submanifold but not a graph. However,
$c(x,y) \geq -e^{x_1+y_1} + \frac{e^{2x_1}}{2} + \frac{e^{2y_1}}{2} = \frac{1}{2}(e^{x_1} - e^{y_1})^2 \geq 0$
with equality on $M$. Therefore, any probability measure whose support is
concentrated on $M$ is optimal for its marginals.
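
The claims of Example 3.1 can be spot-checked numerically. The sketch below assumes the cost reads $c(x,y) = e^{x_1+y_1}\cos(x_2-y_2) + e^{2x_1}/2 + e^{2y_1}/2$ and uses closed-form derivatives computed from that formula; the function names and sample points are illustrative choices, not part of the paper.

```python
import math


def cost(x, y):
    """Cost of Example 3.1, under the formula assumed in the lead-in."""
    return (math.exp(x[0] + y[0]) * math.cos(x[1] - y[1])
            + 0.5 * math.exp(2 * x[0]) + 0.5 * math.exp(2 * y[0]))


def grad_x(x, y):
    """Closed-form gradient of the cost in the x variable."""
    e = math.exp(x[0] + y[0])
    return (e * math.cos(x[1] - y[1]) + math.exp(2 * x[0]),
            -e * math.sin(x[1] - y[1]))


def det_mixed_hessian(x, y):
    """Determinant of the 2x2 matrix of mixed second derivatives."""
    e = math.exp(x[0] + y[0])
    c, s = math.cos(x[1] - y[1]), math.sin(x[1] - y[1])
    return (e * c) * (e * c) - (e * s) * (-e * s)


x = (0.3, 1.1)
y = (-0.2, 0.4)
y_shifted = (y[0], y[1] + 2 * math.pi)  # a different point with the same gradient
on_M = (x[0], x[1] + math.pi)           # the point of G_1 lying over x

# The map y -> D_x c(x, y) takes the same value at y and y_shifted: no twist.
same_gradient = all(abs(a - b) < 1e-9
                    for a, b in zip(grad_x(x, y), grad_x(x, y_shifted)))
# Non-degeneracy: the determinant matches exp(2 (x_1 + y_1)) > 0.
determinant_gap = abs(det_mixed_hessian(x, y) - math.exp(2 * (x[0] + y[0])))
# The cost vanishes on M and is nonnegative elsewhere.
cost_on_M = cost(x, on_M)
cost_off_M = cost(x, y)
```

The shift by $2\pi$ in $y_2$ is the same identification that turns $\mathbb{R}^2$ into the cylinder, which is why the failure of the twist condition disappears there.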

^1 In fact, this condition on the regularity of $\mu^+$ has recently been sharpened [19].

We now show that optimal measures supported on $M$ may not be unique.
Let $S = \{((x_1,x_2),(y_1,y_2)) \mid 0 < x_1 < 1,\ 0 < x_2 < 4\pi\}$. Note
that

$$M \cap S = (G_1 \cap S) \cup (G_2 \cap S) \cup (G_3 \cap S)$$

consists of 3 flat 2-d regions. Let $\gamma$ be uniform measure on these
regions. Now, let $\bar\gamma_1$ be uniform measure on the first half of
$G_1 \cap S$; that is, on

$$G_1 \cap \{((x_1,x_2),(y_1,y_2)) \mid 0 < x_1 < 1,\ 0 < x_2 < 2\pi\}.$$

Let $\bar\gamma_3$ be uniform measure on the second half of $G_3 \cap S$,
or

$$G_3 \cap \{((x_1,x_2),(y_1,y_2)) \mid 0 < x_1 < 1,\ 2\pi < x_2 < 4\pi\}.$$

Take $\bar\gamma_2$ to be twice uniform measure on $G_2 \cap S$ and set
$\bar\gamma = \bar\gamma_1 + \bar\gamma_2 + \bar\gamma_3$. Then $\gamma$
and $\bar\gamma$ share the same marginals and are both optimal measures.
Furthermore, any convex combination $t\gamma + (1 - t)\bar\gamma$ will
also share the same marginals and will be optimal as well.

The next example is similar in that the cost function is non-degenerate
but not twisted. However, this cost would be twisted if we exchanged the
roles of $x$ and $y$. This demonstrates that, unlike non-degeneracy, the
twist condition is not symmetric in $x$ and $y$. For this cost function,
solutions will be unique as long as the second marginal does not charge
small sets.

Example 3.2. Let $M^\pm = \mathbb{R}^2$ and
$c(x,y) = -(x_1\cos(y_1) + x_2\sin(y_1))e^{y_2} + \frac{e^{2y_2}}{2} + \frac{x_1^2 + x_2^2}{2}$.
Note that $\det D^2_{xy}c(x,y) = -e^{2y_2} < 0$, so $c$ is non-degenerate.
However, $D_xc(x,y) = (-\cos(y_1)e^{y_2} + x_1, -\sin(y_1)e^{y_2} + x_2)$,
so $y \in M^- \mapsto D_xc(x,y)$ is not injective and $c$ is not twisted.
On the other hand,
$D_yc(x,y) = ((x_1\sin(y_1) - x_2\cos(y_1))e^{y_2}, -(x_1\cos(y_1) + x_2\sin(y_1))e^{y_2} + e^{2y_2})$
and so $x \in M^+ \mapsto D_yc(x,y)$ is injective. This implies that
solutions are supported on graphs of $x$ over $y$ but that these graphs
are not necessarily invertible.

In fact,
$c(x,y) \geq \frac{1}{2}\left(\sqrt{x_1^2 + x_2^2} - e^{y_2}\right)^2 \geq 0$,
where equality holds if and only if
$\cos(y_1) = \frac{x_1}{\sqrt{x_1^2 + x_2^2}}$,
$\sin(y_1) = \frac{x_2}{\sqrt{x_1^2 + x_2^2}}$, and
$(x_1^2 + x_2^2)^{1/2} = e^{y_2}$. This set of equality is a non-invertible
graph of $x$ over $y$; any measure whose support is contained in this
graph is optimal for its marginals. Note that as any minimizer for this
problem must be supported on this graph, the solution is unique.
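
The lower bound in Example 3.2 comes from completing the square, which can be checked numerically. The sketch assumes the cost $c(x,y) = -(x_1\cos y_1 + x_2\sin y_1)e^{y_2} + e^{2y_2}/2 + (x_1^2+x_2^2)/2$ and compares it with $\frac{1}{2}|x - e^{y_2}(\cos y_1, \sin y_1)|^2$; the function names and sample points are hypothetical.

```python
import math


def cost(x, y):
    """Cost of Example 3.2, under the formula assumed in the lead-in."""
    return (-(x[0] * math.cos(y[0]) + x[1] * math.sin(y[0])) * math.exp(y[1])
            + 0.5 * math.exp(2 * y[1]) + 0.5 * (x[0] ** 2 + x[1] ** 2))


def completed_square(x, y):
    """0.5 * |x - exp(y_2) * (cos y_1, sin y_1)|^2."""
    r = math.exp(y[1])
    return 0.5 * ((x[0] - r * math.cos(y[0])) ** 2
                  + (x[1] - r * math.sin(y[0])) ** 2)


def minimizing_x(y):
    """The unique x at which the cost vanishes for a given y."""
    r = math.exp(y[1])
    return (r * math.cos(y[0]), r * math.sin(y[0]))


samples = [((0.7, -1.2), (0.4, 0.1)),
           ((-2.0, 0.5), (3.0, -0.3)),
           ((0.0, 1.0), (5.5, 0.8))]
identity_gap = max(abs(cost(x, y) - completed_square(x, y)) for x, y in samples)

y = (0.4, 0.1)
y_shifted = (y[0] + 2 * math.pi, y[1])
# Two different y share the same minimizing x: the graph of x over y
# is not invertible.
graph_gap = max(abs(a - b)
                for a, b in zip(minimizing_x(y), minimizing_x(y_shifted)))
cost_on_graph = cost(minimizing_x(y), y)
```

Because the cost is a squared distance from $x$ to a point determined by $y$, each $y$ has exactly one optimal partner $x$, while $y$ and its $2\pi$-shift in $y_1$ share that partner.
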
Remark 3.3. For twisted costs with regular marginals, any solution is
concentrated on the graph of a particular function JEBj/. It is not hard
to show that at most one measure with prescribed marginals can be
supported on such a graph; hence, uniqueness of the optimizer follows
immediately.

While our result asserts that for non-degenerate costs the solution
concentrates on some $n$-dimensional Lipschitz submanifold, the proof says
little more about the submanifold itself. In contrast to the twisted
setting, then, our result cannot be used to deduce uniqueness.
Furthermore, as the example above shows, even if we do know the support of
the optimizer explicitly, solutions may not be unique if this support is
not concentrated on the graph of a function.

Theorem 1.2 also says something about problems where $D^2_{xy}c$ is
allowed to be singular, but where the gradient of its determinant is
non-zero at the singular points. In this case, the implicit function
theorem implies that the set where $D^2_{xy}c$ is singular has Hausdorff
dimension $2n - 1$. Theorem 1.2 is valid wherever $D^2_{xy}c$ is
nonsingular, so that the optimal measure is concentrated on the union of a
smooth $2n - 1$ dimensional set and an $n$ dimensional Lipschitz
submanifold. For example, when $n = 1$, this shows that the support of the
optimal measure is 1 dimensional.

4 A Jacobian equation 

We now provide a simple proof that an optimal map satisfies a prescribed
Jacobian equation almost everywhere. This result was originally proven for
the quadratic cost in $\mathbb{R}^n$ by McCann [28], and for the quadratic
cost on a Riemannian manifold by Cordero-Erausquin, McCann and
Schmuckenschläger [13]. Cordero-Erausquin generalized this approach to
deal with strictly convex costs on $\mathbb{R}^n$ [12]; see also [2]. It
was observed by Ambrosio, Gigli and Savaré that this can be deduced from
results in [5] and [6] when the optimal map is approximately
differentiable, which is true even for some non-smooth costs. Our method
works only when the cost is $C^2$ and non-degenerate, but has the
advantage of a simpler proof, relying only on the area/coarea formula for
Lipschitz functions.

For a Jacobian equation to make sense, the solution must be concentrated
on the graph of a function, and that function must be differentiable in
some sense, at least almost everywhere. A twisted cost suffices to ensure
the first condition. The second follows from the smoothness and
non-degeneracy of $c$. Recall that for a twisted cost the optimal map has
the form $T(x) = c\text{-}\exp_x(Du(x))$; as $c\text{-}\exp_x(\cdot)$ is
the inverse of $y \mapsto D_xc(x,y)$, its differentiability follows from
the non-degeneracy of $c$ and the inverse function theorem. The almost
everywhere differentiability of $Du(x)$ (or, equivalently, the almost
everywhere twice differentiability of $u$) follows from the smoothness of
$c$; $u$ takes the form $u(x) = \inf_y (c(x,y) - v(y))$ for some function
$v(y)$ and is hence semi-concave [18]. In the present context, we need
only the weaker condition that the optimal map is continuous almost
everywhere; its differentiability will follow from Theorem 1.2.

Proposition 4.1. Assume that the cost is non-degenerate and that an
optimizer $\gamma$ is supported on the graph of some function
$T : \mathrm{dom}(T) \to M^-$ which is injective and continuous when
restricted to a set $\mathrm{dom}(T) \subset M^+$ of full Lebesgue
measure. Suppose that the marginals are absolutely continuous with respect
to volume; set $d\mu^+ = f^+(x)dx$ and $d\mu^- = f^-(y)dy$. Then, for
almost every $x$, $f^+(x) = |\det DT(x)| f^-(T(x))$.

Proof. Choose a point $x$ where $T$ is continuous and a neighbourhood
$U^-$ of $T(x)$ such that for $U^+ = T^{-1}(U^-)$, the part of the optimal
graph contained in $U^+ \times U^-$ lies in a Lipschitz graph $v = G(u)$
over the diagonal
$\Delta = \{u = \frac{x+y}{\sqrt{2}} : (x,y) \in U^+ \times U^-\}$, after a
change of coordinates. Now $x = \frac{u - G(u)}{\sqrt{2}}$ and
$y = \frac{u + G(u)}{\sqrt{2}}$, so the optimal measure is supported on
the graph of the Lipschitz function
$(x,y) = (F^+(u), F^-(u)) := \left(\frac{u - G(u)}{\sqrt{2}}, \frac{u + G(u)}{\sqrt{2}}\right)$.
By projecting onto the diagonal, we obtain a measure $\nu$ on $\Delta$
that pushes forward to $\mu^+|_{U^+}$ and $\mu^-|_{U^-}$ under the
Lipschitz mappings $F^+$ and $F^-$, respectively. Now, as $F^+$ is
Lipschitz, the image of any zero volume set must also have zero volume; as
$\mu^+|_{U^+}$ is absolutely continuous with respect to Lebesgue, $\nu$
must be as well; we will write $\nu = h(u)du$. Now, for almost every
$x \in U^+$ there is a unique $y = T(x)$ such that
$(x,y) \in \mathrm{spt}(\gamma)$ and hence a unique
$u = \frac{x+y}{\sqrt{2}}$ on the diagonal such that $x = F^+(u)$. It
follows that the map $F^+$ is one to one almost everywhere and so for
every set $A \subset \Delta$ we have
$\int_A h(u)du = \int_{F^+(A)} f^+(x)dx$. But the right hand side is
$\int_A f^+(F^+(u))|\det DF^+(u)|du$ by the area formula; as $A$ was
arbitrary, this means $h(u) = f^+(F^+(u))|\det DF^+(u)|$ almost
everywhere. Similarly, $h(u) = f^-(F^-(u))|\det DF^-(u)|$ almost
everywhere, hence

$$f^+(F^+(u))|\det DF^+(u)| = f^-(F^-(u))|\det DF^-(u)|$$

almost everywhere. As the image under $F^+$ of a negligible set must
itself be negligible, we have

$$f^+(x)|\det DF^+((F^+)^{-1}(x))| = f^-(F^-((F^+)^{-1}(x)))|\det DF^-((F^+)^{-1}(x))| \qquad (6)$$

for almost all $x$. Note that as $F^+$ is one to one almost everywhere and
$F^+(\{u \in \Delta : \det DF^+(u) = 0\})$ has measure zero by the area
formula, $(F^+)^{-1}$ is differentiable almost everywhere. As
$T \circ F^+ = F^-$, it follows that $T$ is differentiable almost
everywhere and

$$\det DT(F^+(u)) \det DF^+(u) = \det DF^-(u)$$

whenever $F^+$ and $F^-$ are differentiable at $u$ and $T$ is
differentiable at $F^+(u)$. Hence,

$$\det DT(x) \det DF^+((F^+)^{-1}(x)) = \det DF^-((F^+)^{-1}(x)) \qquad (7)$$

for all $x$ such that $T$ is differentiable at $x$ and $F^+$ and $F^-$ are
differentiable at $(F^+)^{-1}(x)$. $T$ is differentiable for almost every
$x$, $F^+$ and $F^-$ are differentiable for almost every $u$ and $F^+$ is
Lipschitz; it follows that the above holds almost everywhere. Now,
combining (6) and (7) we obtain $f^+(x) = |\det DT(x)| f^-(T(x))$ for
almost every $x$.

□ 
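
A one-dimensional sanity check of the Jacobian equation may help fix ideas. The sketch assumes the quadratic cost on the line, for which the monotone rearrangement is optimal, with $f^+ = 1$ on $[0,1]$ and $f^-(y) = 2y$ on $[0,1]$; matching the distribution functions $x$ and $y^2$ gives $T(x) = \sqrt{x}$. The names and the finite-difference step are illustrative assumptions.

```python
import math


def f_plus(x):
    """Density of the first marginal: uniform on [0, 1]."""
    return 1.0


def f_minus(y):
    """Density of the second marginal on [0, 1]."""
    return 2.0 * y


def T(x):
    """Monotone map pushing f_plus forward to f_minus (CDF matching)."""
    return math.sqrt(x)


def jacobian_residual(x, h=1e-6):
    """f_plus(x) - |T'(x)| f_minus(T(x)), with T' from central differences."""
    dT = (T(x + h) - T(x - h)) / (2 * h)
    return f_plus(x) - abs(dT) * f_minus(T(x))


residuals = [abs(jacobian_residual(k / 10)) for k in range(1, 10)]
worst_residual = max(residuals)  # should be negligible at interior points
```

Here $|T'(x)| f^-(T(x)) = \frac{1}{2\sqrt{x}} \cdot 2\sqrt{x} = 1 = f^+(x)$, so the residual only reflects the finite-difference error.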

Remark 4.1. Note that the preceding proposition does not require that
continuity of $T$ extend outside $\mathrm{dom}(T)$. Thus it applies to
$T = Du$, for example, where $u$ is an arbitrary convex function and
$\mathrm{dom}(T)$ is its domain of differentiability.